ABSTRACT The human oral cavity is home to a large and diverse community of viruses that have yet to be characterized in patients
with periodontal disease. We recruited and sampled saliva and oral biofilm from a cohort of humans either periodontally
healthy or with mild or significant periodontal disease to discern whether there are differences in viral communities that reflect 
their oral health status. We found communities of viruses inhabiting saliva and the subgingival and supragingival biofilms of 
each subject that were composed largely of bacteriophage. While there were homologous viruses common to different subjects 
and biogeographic sites, for most of the subjects, virome compositions were significantly associated with the oral sites from 
which they were derived. The largest distinctions between virome compositions were found when comparing the subgingival 
and supragingival biofilms to those of planktonic saliva. Differences in virome composition were significantly associated with 
oral health status for both subgingival and supragingival biofilm viruses but not for salivary viruses. Among the differences
identified in virome compositions was a significant expansion of myoviruses in subgingival biofilm, suggesting that periodontal
disease favors lytic phage. We also characterized the bacterial communities in each subject at each biogeographic site by using 
the V3 hypervariable segment of the 16S rRNA and did not identify distinctions between oral health and disease similar to those 
found in viral communities. The significantly altered ecology of viruses of oral biofilm in subjects with periodontal disease compared
to that of relatively periodontally healthy ones suggests that viruses may serve as useful indicators of oral health status.

IMPORTANCE Little is known about the role or the constituents of viruses as members of the human microbiome. We investigated 
the composition of human oral viral communities in a group of subjects who were relatively periodontally healthy or had significant
periodontitis to determine whether health status may be associated with differences in viruses. We found that most of the viruses present
were predators of bacteria. The viruses inhabiting dental plaque were significantly different on the basis of oral health status, 
while those present in saliva were not. Dental plaque viruses in periodontitis were predicted to be significantly more likely to kill 
their bacterial hosts than those found in healthy mouths. Because oral diseases such as periodontitis have been shown to have 
altered bacterial communities, we believe that viruses and their role as drivers of ecosystem diversity are important contributors 
to the human oral microbiome in health and disease states. 
We are in the early stages of understanding the tremendous
diversity harbored within the human microbiome and its 
significant role in human health. Viruses inhabiting human body 
surfaces may be key factors in shaping human microbial ecology 
(1-7), but the potential role of viral communities in human health
and disease remains largely unexplored (2, 8-10). Microbial com- 
munities can now be studied in greater detail because of the in- 
creased accessibility of sequencing technology and improved an- 
alytical capabilities (11). There have been numerous studies of the 
bacterial communities inhabiting human body surfaces, such as 
the skin (12, 13) and the gastrointestinal (14, 15), respiratory (16, 
17), and genitourinary tracts (18-20) but fewer studies of the viral 
communities inhabiting these sites. While studies of viral commu- 
nities have shown that viruses on human body surfaces are diverse 
(5-7, 21), studies have yet to illuminate how viral community 
diversity and membership pertain to human health and disease. 



Periodontitis is a highly prevalent oral disease among adults 
(22) that results from inflammation of the supporting structures 
of the teeth. Some have hypothesized that the disease is caused by 
the host immune response to the presence of specific pathogens 
(23-27). Historically, microbiological aspects of periodontal dis- 
ease and dental caries have been studied by culture-based or PCR/ 
hybridization-based methods to detect pathogens collected from 
subgingival plaque. Bacterial species such as Porphyromonas gingivalis
(28), Tannerella forsythia (29), Aggregatibacter actinomycetemcomitans
(30), Streptococcus mutans (31), and Treponema denticola
(32) have been implicated as etiological agents of
periodontal disease by using these or similar methods (33-35). 
Herpesviruses such as herpes simplex virus 1 (HSV-1), cytomeg- 
alovirus (CMV), and Epstein-Barr virus (EBV) have also been 
examined in association with periodontitis (36-41). Certain studies
have shown an increased presence of HSV-1, CMV, and/or
EBV in subgingival plaque at sites of periodontitis (36, 39, 40),
while other studies have shown no association (38, 42-44).

TABLE 1 Characteristics of study subjects

Subject  Age (yr)  Ethnicity         Sex     Diagnosis                                   Smoking  Comorbidity, other information

Significant periodontal disease
D1       73        Caucasian         Male    Chronic severe generalized periodontitis    Yes      None
D2       66        Caucasian         Male    Chronic severe generalized periodontitis    Yes      Hypertension
D3       62        Caucasian         Female  Chronic moderate generalized periodontitis  No       Hypothyroidism, vegetarian
D4       69        Caucasian         Female  Chronic severe generalized periodontitis    No       None
D5       48        Caucasian         Female  Chronic moderate generalized periodontitis  Yes      None
D6       73        Caucasian         Male    Chronic severe generalized periodontitis    Yes      None
D7       27        Asian             Male    Generalized aggressive mild periodontitis   No       Amoxicillin in prior 3 mo

Healthy or with mild periodontal disease
H1       25        Asian             Male    Healthy                                     No       None
H2       25        Caucasian         Male    Healthy                                     No       None
H3       51        Hispanic          Female  Moderate gingivitis(a)                      No       Diabetes
H4       34        African American  Male    Healthy                                     No       None
H5       32        Asian             Female  Healthy                                     No       Diabetes
H6       24        Asian             Female  Healthy                                     No       None
H7       27        Caucasian         Male    Healthy                                     No       None
H8       44        Caucasian         Female  Chronic mild generalized periodontitis(b)   No       None
H9       50        Hispanic          Female  Chronic mild generalized periodontitis(b)   Yes      None

a Has signs of gingival inflammation but no periodontal attachment loss or bone loss. Gingivitis affects about half of U.S. adults.
b Has signs of gingival inflammation and, at most, 2-mm attachment loss. Nine percent of U.S. adults have this condition.

Now,
as our paradigms for understanding the interconnection between 
microbes and human health change, much of the study of mi- 
crobes in periodontal disease has shifted toward studying commu- 
nities rather than individual pathogens (45-50). Rather than verifying
the presence of a few viruses in periodontitis,
we are interested in the broader dynamics of the communities of 
organisms present on and interacting with the gingiva and associ- 
ated tissues. 

The oral cavity contains multiple soft and hard tissue surfaces 
creating diverse niches that harbor a wide range of microbiota, 
including >1,000 different bacterial taxa (51). While herpesviruses
may be present in the oral cavity, there is a much larger
population of viruses present, the majority of which are bacteriophage
(6, 21). Many of these phage belong to the Caudovirus families
Siphoviridae (generally lysogenic with intermediate host
ranges), Myoviridae (typically lytic with relatively broad host 
ranges), and Podoviridae (typically lytic with relatively narrow 
host ranges) (52, 53). Communities of oral viruses are highly per- 
sonalized and vary according to host sex (6, 54). Unrelated house- 
hold contacts share significant proportions of their viromes, 
which suggests that substantial environmental influences affect 
oral viral ecology (21). Because oral viruses have been shown to 
elicit host immune responses, they could potentially play a role in 
shaping oral immunity and disease pathogenesis (10, 54). In ad- 
dition, we previously demonstrated that human oral viruses carry
substantial gene functions that may be involved in the pathogenic
functions of their host bacteria (6), which suggests a more subtle 
role for viruses as members of the human oral microbiome. 

In this study, we investigated oral viral community member- 
ship in a cohort of periodontally healthy subjects and those with 
disease. We examined viruses from planktonic saliva and from 
subgingival and supragingival biofilms to determine whether 
there are significant differences in viral community membership 
by oral biogeographic site and to determine whether the ecology of 
human oral viral communities reflects oral health status. 

RESULTS 

Human subjects and isolation of viruses. We recruited 16 human
subjects and sampled their saliva, subgingival plaque, and suprag- 
ingival plaque. Seven of the 16 subjects had periodontal disease, 
while the other 9 had good overall periodontal health (Table 1). 
Because of the relatively low biomass at each tooth, we pooled the 
subgingival or supragingival plaque to have sufficient biomass to 
examine each subject individually. 

We isolated viruses from the saliva, subgingival plaque, and 
supragingival plaque of each subject for a total of 48 separate vi- 
romes. DNA viruses were enriched according to our previously 
described protocols (6) by CsCl density gradient ultracentrifuga- 
tion, followed by extraction of DNA from intact virions. All vi- 
romes were sequenced by semiconductor sequencing (55) for a 
total of 29,076,203 reads after processing (10,386,122 from saliva, 
8,659,122 from subgingival plaque, and 10,030,956 from supragingival
plaque) with a mean length of 146 nucleotides (nt).

FIG 1 Bar graphs of the mean percentages of contigs (± the standard errors) with viral homologues in the NR database from all of the subjects. Panels: A, saliva;
B, subgingival plaque; C, supragingival plaque.

We sequenced an average of 1,817,263 reads per subject and 605,754
per individual virome. Each virome was screened for the presence
of contaminating cellular nucleic acids by BLASTN analysis
(E score, <10^-5) against a composite database of 16S rRNA
and the Human Reference Genome database
(ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/). None of the supragingival
or subgingival plaque viromes had any identifiable homologues
to 16S rRNA, and only 3 of the 16 salivary viromes had
single identifiable 16S rRNA homologues (see Table S1 in the supplemental
material), indicating that each of the viromes was relatively
free of contaminating bacterial nucleic acids. While we did
identify homologues to human DNA in almost all viromes, the
mean percentage was low (mean, 0.38%; range, 0.00 to 7.15%). All
homologues to human DNA were removed prior to further analysis.
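
The screening step above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the BLASTN results are available as 12-column tab-separated output (query ID in column 1, E value in column 11); the function names, the toy records, and the read count of 1,000 are hypothetical and are not taken from the study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text for the contamination screen


def reads_with_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return the set of query IDs with at least one hit below the cutoff.

    Assumption: 12-column tab-separated BLAST output, with the query ID in
    column 1 and the E value in column 11.
    """
    hits = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < cutoff:
            hits.add(row[0])
    return hits


def percent_contaminated(n_reads, hits):
    """Percentage of reads in a virome that matched the screening database."""
    return 100.0 * len(hits) / n_reads


# Hypothetical records: read1 and read3 pass the cutoff, read2 does not.
example = "\n".join([
    "read1\tchr1\t98.5\t140\t2\t0\t1\t140\t1000\t1139\t1e-60\t250",
    "read2\tchr2\t80.0\t50\t10\t0\t1\t50\t500\t549\t0.5\t30",
    "read3\t16S_x\t99.0\t146\t1\t0\t1\t146\t1\t146\t2e-70\t270",
])
flagged = reads_with_hits(example)
print(sorted(flagged), percent_contaminated(1000, flagged))
```

Reads flagged in this way would then be dropped before assembly, which corresponds to the removal of human homologues described above.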

Examination of viruses in saliva and oral biofilm. Because 
longer contigs are more likely to generate productive BLAST 
searches, we assembled the reads for all viromes in each subject. 
The mean number of contigs per subject was 3,468 (763 for saliva, 
1,201 for subgingival plaque, and 1,504 for supragingival plaque), 
with a mean length of 1,041 nt (955 for saliva, 1,105 for subgingi- 
val plaque, and 1,061 for supragingival plaque). The median 
length of all contigs was 509 nt (491 for saliva, 516 for subgingival 
plaque, and 513 for supragingival plaque) (see Fig. S1A in the
supplemental material), and the median GC content was 44%
(44% for saliva, 43% for subgingival plaque, and 44% for supragingival
plaque) (see Fig. S1B). We subjected all contigs to BLASTX
analysis against the NCBI Nonredundant (NR) database to iden- 
tify homologous sequences. Similar to prior studies, there was a 
limited number of contigs that had identifiable homologues. A 
significantly higher percentage of salivary contigs than contigs de- 
rived from plaque had identifiable viral homologues (40.09% of 
those from saliva, 27.87% of those from subgingival plaque, and 
29.09% of those from supragingival plaque; P < 0.0001) (Fig. 1). 
The substantial differences in the structural genes identified in 
contigs from each biogeographic site (40.7% of the contigs from 
saliva, 12.4% of those from subgingival plaque, and 16.3% of those 
from supragingival plaque) account for much of the observed dif- 
ferences. These differences are likely explained by a lack of viruses 
present in the NR database that are homologous to subgingival 
and supragingival microbiota. Similar differences were not ob- 
served for homologues involved in viral replication and integra- 
tion, where 37.0% of the contigs from saliva had these homo- 
logues, 43.8% of those from subgingival plaque had them, and 
43.3% of those from supragingival plaque had them (Fig. 1). 
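
As a rough illustration of the contig summary statistics reported above (mean and median length, median GC content), the following sketch computes them for a list of sequences. The function names and toy sequences are hypothetical; the assembler and tools actually used are not described in this section.

```python
from statistics import mean, median


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq) if seq else 0.0


def contig_summary(contigs):
    """Summarize contig count, mean/median length, and median GC content."""
    lengths = [len(c) for c in contigs]
    return {
        "n": len(contigs),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "median_gc": median(gc_percent(c) for c in contigs),
    }


# Toy sequences, far shorter than real contigs, for illustration only.
toy = ["ATGC", "GGCCAT", "ATATATAT", "GCGCGCGCAT"]
summary = contig_summary(toy)
print(summary)
```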

Personalized oral viruses by biogeographic site. We com- 
pared the oral viruses in each subject by biogeographic site (saliva 
versus subgingival plaque versus supragingival plaque) to deter- 
mine whether there were identifiable viruses specific to each subject
by site.

TABLE 2 Subgingival and supragingival virome homologues within and between subjects

         % Homology in subgingival plaque              % Homology in supragingival plaque
Subject  Intrasubject(a)  Intersubject(a)  P value(b)  Intrasubject(a)  Intersubject(a)  P value(b)
D1       91.46 ± 3.19     79.89 ± 4.97     0.0083      90.97 ± 3.07     84.39 ± 3.61     0.0479
D2       92.06 ± 3.21     67.56 ± 6.67     0.0005      92.26 ± 2.43     43.76 ± 9.61     <0.0001
D3       87.58 ± 2.90     48.75 ± 7.15     <0.0001     81.49 ± 3.36     63.97 ± 4.29     0.0009
D4       87.11 ± 3.13     65.79 ± 5.22     0.0004      84.55 ± 2.34     62.48 ± 3.50     <0.0001
D5       88.53 ± 4.25     84.25 ± 3.81     0.1913      88.58 ± 4.23     73.53 ± 5.48     0.0114
D6       90.60 ± 3.19     76.35 ± 4.80     0.0058      92.37 ± 3.35     79.36 ± 5.58     0.0260
D7       88.84 ± 3.69     76.13 ± 4.81     0.0186      84.40 ± 3.24     75.19 ± 3.42     0.0255
H1       87.23 ± 2.64     65.70 ± 5.62     0.0001      87.31 ± 2.66     68.98 ± 3.99     0.0005
H2       84.52 ± 2.90     74.00 ± 3.66     0.0056      84.51 ± 3.42     79.77 ± 3.09     0.0024
H3       84.80 ± 3.35     66.25 ± 4.43     0.0019      86.52 ± 3.72     82.41 ± 3.02     0.0043
H4       89.29 ± 2.39     80.49 ± 3.19     0.0006      89.72 ± 2.11     77.16 ± 3.41     0.0125
H5       86.81 ± 2.76     82.42 ± 2.98     0.0626      93.90 ± 1.91     80.54 ± 4.21     0.0001
H6       92.62 ± 1.95     80.09 ± 4.16     0.0001      85.93 ± 2.74     78.82 ± 3.04     <0.0001
H7       91.42 ± 2.46     78.12 ± 4.39     0.0005      89.86 ± 2.64     78.52 ± 4.22     0.1747
H8       91.04 ± 2.23     66.98 ± 5.06     <0.0001     86.71 ± 3.05     73.74 ± 3.51     0.1381
H9       84.94 ± 2.84     80.72 ± 2.62     0.1008      87.32 ± 2.61     72.52 ± 4.17     <0.0001

a Based on the mean of 10,000 iterations. One thousand random contigs were sampled per iteration.
b Empirical P value based on the fraction of times the estimated percent homologous contigs for each subject exceeds that for different subjects. Values that indicate statistical
significance are in bold.

We used BLASTN analysis (E score, <10^-10) to quan-
tify the numbers and patterns of shared homologous contigs 
among all 16 subjects at each site. There were numerous homol- 
ogous viruses among the saliva samples from many subjects, par- 
ticularly among healthy subjects H5 to H8 (see Fig. S2A in the 
supplemental material), as well as among the subgingival and su- 
pragingival plaque samples from all of the subjects (see Fig. S2C 
and E, respectively). However, the pattern of homologous viruses 
observed in heat maps also suggested that despite some shared 
viruses, many oral viruses were specific to both subgingival and 
supragingival plaque samples from an individual. This pattern 
suggested that many viruses of oral biofilm were unique to each 
individual. We also performed global assemblies from the reads 
from all of the subjects by biogeographic site and found similar 
patterns in saliva (see Fig. S2B) and subgingival (see Fig. S2D) and 
supragingival plaque (see Fig. S2F) samples that suggested that 
many of the oral viruses were specific to individuals. 

We performed permutation tests in which we randomly sam- 
pled the virome contigs to measure the proportions of intrasubject 
and intersubject homologous viruses to determine whether the 
viral ecology at each biogeographic site was significantly individ- 
ual specific. For subgingival plaque, we found that viromes dem- 
onstrated significant levels of individual-specific contigs for 14 of 
the 16 subjects (P < 0.05) and in supragingival plaque for 13 of the
16 subjects (P < 0.05) (Table 2). In subgingival plaque from sub- 
jects with periodontal disease, 87.8% ± 4.3% of the contigs had 
intrasubject homologues and 69.0% ± 13.6% had intersubject 
homologues. Similar results were found for periodontally healthy 
subjects or those with mild disease, with 88.0% ± 2.8% of the 
contigs with intrasubject homologues and 77.0% ± 4.3% with 
intersubject homologues. In supragingival plaque from the significant
periodontal disease group, 89.5% ± 1.9% of the viral contigs
had intrasubject homologues and 71.2% ± 11.9% had intersub- 
ject homologues; in that from periodontally healthy subjects, 
88.1% ± 3.1% of the viral contigs had intrasubject homologues 
and 75.0% ± 6.9% had intersubject homologues. Similar results 
were also found for globally assembled contigs from all of the
subjects by biogeographic site, where significant levels of
individual-specific contigs were identified for all 16 subjects in 
subgingival plaque (P < 0.05; see Table S2 in the supplemental 
material) and for 14 of the 16 subjects in supragingival plaque
(P < 0.05). These data indicate that there is a significant association
between individual subjects and their biofilm viral community 
membership, regardless of oral health status. 
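
The resampling procedure summarized in the Table 2 footnotes (10,000 iterations, 1,000 random contigs per iteration, and an empirical P value) can be sketched as follows. This is only one plausible reading of that description: the representation of contigs as true/false homologue flags, sampling with replacement, the definition of the P value as the fraction of iterations in which the intrasubject percentage does not exceed the intersubject percentage, and all names are assumptions made for illustration.

```python
import random
from statistics import mean


def resample_homology(intra_flags, inter_flags, n_iter=10_000,
                      n_sample=1_000, seed=0):
    """Compare intrasubject and intersubject homology by random resampling.

    intra_flags / inter_flags: one boolean per contig, True when the contig
    has a homologue within the same subject / in other subjects (assumed
    input format). Returns (mean intra %, mean inter %, empirical P value).
    """
    rng = random.Random(seed)
    intra_pcts, inter_pcts = [], []
    not_exceeding = 0
    for _ in range(n_iter):
        # Sampling with replacement is an assumption of this sketch.
        intra = 100.0 * sum(rng.choices(intra_flags, k=n_sample)) / n_sample
        inter = 100.0 * sum(rng.choices(inter_flags, k=n_sample)) / n_sample
        intra_pcts.append(intra)
        inter_pcts.append(inter)
        if intra <= inter:
            not_exceeding += 1
    return mean(intra_pcts), mean(inter_pcts), not_exceeding / n_iter


# Toy flags: 90% of contigs shared within the subject, 50% with others.
toy_intra = [True] * 90 + [False] * 10
toy_inter = [True] * 50 + [False] * 50
mean_intra, mean_inter, p_value = resample_homology(
    toy_intra, toy_inter, n_iter=500, n_sample=200)
print(mean_intra, mean_inter, p_value)
```

Fewer iterations are used in the toy call only to keep it fast; the defaults mirror the iteration and sample counts stated in the footnotes.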

Viromes from saliva from all of our periodontally healthy sub- 
jects or those with mild disease demonstrated significantly higher 
levels of intrasubject shared homologues than intersubject shared 
homologues (mean, 94.8% ± 2.6% compared to 78.3% ± 5.3%
[P < 0.05] for all healthy subjects) (see Table S3 in the supplemental
material). For subjects with significant periodontal disease, the 
proportions of intrasubject and intersubject shared homologues 
in saliva were not significantly different for four of the seven sub- 
jects (mean for all of the subjects, 94.2% ± 1.7% compared to 
87.4% ± 6.8%). For globally constructed assemblies, significant 
levels of individual-specific contigs were found in saliva from all 
16 subjects (P < 0.05; see Table S4 in the supplemental material). 

Biogeographic differences among viral communities. We 
compared the virome contigs across all of the subjects by biogeo- 
graphic site to determine whether there were significant propor- 
tions of virome contigs specific to each site. We did not identify 
any significant associations with biogeographic sites among the 
viruses within saliva or among the viruses within biofilms (7.8% ± 
12.0% versus 5.1% ± 7.8%; P = 0.258) (Table 3). In contrast, we 
found highly significant associations by biogeographic site among 
the viruses in subgingival plaque compared to other sites (37.5% 
± 9.0% versus 10.6% ± 8.7%; P = 0.0122) and supragingival 
plaque compared to other sites (34.5% ± 7.0% versus 10.5% ± 
8.6%; P = 0.0087). These data indicate that the biogeographic site 
may be an important determinant of oral viral ecology. We found 
similar trends in the data when examining globally constructed 
assemblies from all reads and biogeographic sites; however, these 
differences were not statistically significant (see Table S5 in the 
supplemental material). 
TABLE 3 Viral homologues and shared 16S rRNA OTUs within and between subject groups

                          % Homology in virome                              % Homology in 16S rRNA
Site and/or status        Within group(a)    Between groups(a)  P value(b)  Within group(a)    Between groups(a)  P value(b)

Sites
  Saliva                  7.81 ± 12.03       5.07 ± 7.78        0.2584      47.50 ± 34.17      11.58 ± 25.04      0.1671
  Subgingival plaque      37.53 ± 8.98       10.60 ± 8.70       0.0122      73.01 ± 16.96      40.76 ± 35.83      0.2201
  Supragingival plaque    34.45 ± 6.99       10.50 ± 8.62       0.0087      74.29 ± 16.43      40.30 ± 36.16      0.2143

Health status
  Saliva
    Healthy               7.06 ± 9.67        1.06 ± 2.68        0.1090      34.08 ± 37.82      36.01 ± 32.96      0.3416
    Periodontal disease   12.43 ± 12.40      1.07 ± 2.61        0.1254      70.11 ± 12.74      34.02 ± 32.82      0.1464
  Subgingival plaque
    Healthy               38.63 ± 6.40       13.56 ± 5.64       <0.0001     73.41 ± 15.39      71.15 ± 15.83      0.4382
    Periodontal disease   37.16 ± 10.86      13.59 ± 5.81       0.0221      77.23 ± 11.93      70.67 ± 15.02      0.3474
  Supragingival plaque
    Healthy               31.25 ± 5.91       12.18 ± 4.11       0.0009      62.27 ± 34.50      65.23 ± 27.16      0.3862
    Periodontal disease   34.66 ± 5.64       12.11 ± 4.05       0.0018      74.20 ± 13.26      64.44 ± 27.66      0.4181

a Based on the mean of 10,000 iterations. One thousand random contigs were sampled per iteration.
b Empirical P value based on the fraction of times the estimated percent homologous contigs or shared OTUs for each group exceeded that between groups. Values that indicate
statistical significance are in bold.



Comparison of viruses in orally healthy persons with those 
in persons with periodontal disease. We next compared the oral 
viruses of relatively periodontally healthy subjects with those of 
subjects with significant disease to determine whether viral com- 
munity composition might be associated with oral health status. 
We performed principal-coordinate analysis (PCOA) to compare 
the patterns of variation in shared homologues across all of the 
subjects and biogeographic sites. Many of the salivary viromes 
were similarly clustered on the basis of host disease status 
(Fig. 2A). A similar trend was also identified in oral biofilm, where 
most of the subgingival and supragingival viromes from subjects 
with significant periodontal disease were similarly clustered 
(Fig. 2B). We next quantified the proportion of homologous contigs
across viromes of relatively periodontally healthy subjects or
those with disease to determine whether the patterns of variation observed in
PCOA were supported statistically. For saliva, the proportion of 
homologous virome contigs was greater for comparisons among 
subjects with significant periodontal disease (12.4% ± 12.4%) 
than between subjects with different oral health status (1.1% ± 2.6%),
but this difference was not statistically significant (Table 3).
The proportion of shared virome homologues was much
greater among subjects with significant periodontal disease than 
in comparisons between subjects with different oral health status 
for subgingival plaque (37.2 ± 10.9% versus 13.6 ± 5.8%; P = 
0.022) and supragingival plaque (34.7 ± 5.6% versus 12.1 ± 4.1%; 
P = 0.002) (Table 3). These data indicate that human oral viral 
ecology in oral biofilm is significantly associated with oral health 
status. While similar trends were found in the data for globally 
constructed assemblies, the data were not significant for any bio- 
geographic site (see Table S5 in the supplemental material). 
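
The within-group versus between-group comparison reported in Table 3 can be pictured with a small sketch that averages pairwise sharing values. The data structure (a dictionary of pairwise percentages), the subject labels, the numbers, and the function name are all hypothetical; the study's own estimates came from the resampling procedure described in the table footnotes, which this sketch does not reproduce.

```python
from itertools import combinations
from statistics import mean


def within_between(shared, groups, target):
    """Mean pairwise sharing within one group and between different groups.

    shared: dict mapping frozenset({a, b}) -> percent of homologous contigs
            shared by subjects a and b (assumed input format).
    groups: dict mapping subject -> group label.
    target: label of the group whose within-group sharing is wanted.
    """
    within, between = [], []
    for a, b in combinations(sorted(groups), 2):
        value = shared[frozenset((a, b))]
        if groups[a] == groups[b] == target:
            within.append(value)
        elif groups[a] != groups[b]:
            between.append(value)
    return mean(within), mean(between)


# Toy data: three subjects with disease, two healthy subjects.
groups = {"D1": "disease", "D2": "disease", "D3": "disease",
          "H1": "healthy", "H2": "healthy"}
shared = {
    frozenset(("D1", "D2")): 40, frozenset(("D1", "D3")): 30,
    frozenset(("D2", "D3")): 35, frozenset(("H1", "H2")): 38,
    frozenset(("D1", "H1")): 10, frozenset(("D1", "H2")): 12,
    frozenset(("D2", "H1")): 14, frozenset(("D2", "H2")): 16,
    frozenset(("D3", "H1")): 12, frozenset(("D3", "H2")): 14,
}
within_disease, between_groups = within_between(shared, groups, "disease")
print(within_disease, between_groups)
```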

We characterized the virus families in all of the subjects to 
determine whether significant differences existed in periodontally 
healthy subjects and those with disease by biogeographic site. In 
samples from each site in orally healthy subjects, members of the 
family Siphoviridae were the most abundant virus types (Fig. 3), 
and their generally lysogenic lifestyle suggests that lysogeny is the 
preferred state of viruses in orally healthy subjects. In addition, the
abundance of podoviruses from each site was similar, but the relative
abundance of myoviruses varied considerably by site and disease state.

FIG 2 PCOA of beta diversity present in the viromes of each subject at each biogeographic site. Relatively periodontally healthy subjects are represented in white,
and subjects with significant periodontal disease are represented in black. Circles represent saliva, squares represent subgingival plaque, and triangles represent
supragingival plaque. Panels: A, saliva; B, subgingival and supragingival plaque.

FIG 3 Pie charts of bacteriophage families (Siphoviridae, Myoviridae, Podoviridae, and unclassified) present in the saliva (A) and subgingival (B) or supragingival plaque (C) samples of relatively periodontally healthy
subjects (left) and those with disease (right). Asterisks indicate significant differences (P < 0.05) between the proportions of virus families identified in
periodontally healthy subjects and those with disease.

Myoviruses were significantly more abundant in sa-
liva from healthy individuals (Fig. 3A), though siphoviruses re- 
mained the most abundant viral type in healthy subjects and in 
those with disease. In subgingival plaque, however, we found 
myoviruses to be significantly more abundant in those with peri- 
odontal disease than in healthy subjects (Fig. 3B). Myoviruses are 
generally lytic, and their predominance in subjects with periodon- 
tal disease suggests a more active role for viruses in driving bacte- 
rial diversity in the subgingival crevice. The myoviruses in subgin- 
gival plaque from subjects with periodontal disease were even 
more abundant than siphoviruses, which was not observed in any 
other oral microenvironment studied. No significant differences 
were identified in the virus families found in supragingival plaque 
(Fig. 3C). Thus, virome membership was significantly altered in 
subjects with periodontal disease, predominantly as a result of the 
increased abundance of myoviruses in their subgingival plaque. 

Characterization of bacterial communities. On the basis of 
our findings that there were significant associations between oral 
viruses and individual subjects (see Fig. S2 in the supplemental 
material; Table 2), biogeographic sites (Table 3), and oral health 
status (Table 3), we investigated whether similar trends might also
be identifiable for oral bacterial communities. We examined these
communities on the basis of the V3 hypervariable region of the 
16S rRNA. We sequenced 880,410 16S rRNA reads after process- 
ing, for a mean of 55,026 reads per subject and 18,342 reads per 
biogeographic site. We performed rarefaction analyses, which 
demonstrated that most of the diversity present in each subject 
had been adequately sampled for saliva (see Fig. S3A in the sup- 
plemental material) and subgingival (see Fig. S3B) and supragin- 
gival plaque (see Fig. S3C). Diversity of bacterial communities was 
estimated by using the Shannon diversity index (H'). In the sub- 
gingival and supragingival plaque samples from most of our sub- 
jects, the estimated species diversity was greater in relatively orally 
healthy subjects (H', 6.5) than in those with significant periodontal
disease (H', 5.3). When the data for all of the subjects and
biogeographic sites by oral health status were combined, the esti- 
mated species diversity in relatively orally healthy subjects ex- 
ceeded that in subjects with periodontal disease (see Fig. S3D; H', 
7.0 in healthy subjects versus 6.4 in those with disease). Species 
richness in saliva exceeded that in both subgingival and supragin- 
gival plaque, regardless of oral health status (see Fig. S3E). 
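
For reference, the Shannon diversity index used above is H' = -sum(p_i * log p_i), where p_i is the proportion of reads assigned to OTU i. The sketch below assumes the natural logarithm by default, since the base used in the study is not stated in this section; the function name and the toy counts are hypothetical.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon diversity index H' from a list of per-OTU read counts.

    The logarithm base is an assumption (natural log by default).
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # OTUs with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p, base)
    return h


even = shannon_index([10, 10, 10, 10])   # four equally abundant OTUs
uneven = shannon_index([37, 1, 1, 1])    # one dominant OTU
print(even, uneven)
```

An even community yields a higher H' than one dominated by a single OTU, which is the sense in which the healthy subjects are described as more diverse.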

We used PCOA to examine the patterns of variation observed 
in oral bacterial communities by biogeographic site.

FIG 4 PCOA of beta diversity present in the bacterial communities of each subject at each biogeographic site. Panels: A, classified by biogeographic site;
B, classified by oral health status.

We found variation differentiating the salivary microbiota from those of oral
biofilm (Fig. 4A); however, there was little distinction between the 
subgingival and supragingival biota. We also quantified the pro- 
portion of shared bacterial operational taxonomic units (OTUs) 
by biogeographic site in all of our subjects to determine whether 
there were statistically significant differences in the biota at each 
site. The proportion of the biota shared within each biogeographic 
site was greater than that shared by different sites (47.5% versus 
11.6% for saliva, 73.0% versus 40.8% for subgingival plaque, and 
74.3% versus 40.3% for supragingival plaque), but none of these 
differences was statistically significant (Table 3). We found a dis- 
tinct variation in the viromes of relatively orally healthy subjects 
and those with significant periodontal disease (Fig. 2), but similar 
significant trends could not be identified for the bacterial biota 
(Fig. 4B and Table 3). There was a significant association between 
virome constituents and oral biogeographic sites (Table 3), but no 
significant associations were identified in the bacterial biota (see 
Tables S6 and S7 in the supplemental material). 

Taxonomic compositions of viral and bacterial communi- 
ties. We compared the putative taxonomic compositions of viral 
communities across each biogeographic site to determine whether 
there were identifiable differences at a high taxonomic level. We 
found that viruses of Firmicutes and Proteobacteria were highly 
prevalent in saliva, followed by those of Bacteroidetes and Actino- 
bacteria (Fig. 5A). In contrast, viruses of Proteobacteria and Bacte- 
roidetes predominated in oral biofilm, with those of Firmicutes and 
Actinobacteria representing only a small minority of the viruses 
identified (Fig. 5B and C). The relative proportions of Firmicutes, 
Actinobacteria, and Bacteroidetes phage were all significantly dif- 
ferent (P < 0.05) when the taxonomic compositions of viral com- 
munities in saliva and oral biofilm were compared. Many of these 
differences were also reflected in the taxonomies of the bacterial 
communities, where Firmicutes predominated in saliva, com- 
pared to a relatively even distribution of Firmicutes, Actinobacte- 
ria, and Bacteroidetes in oral biofilm (Fig. 5D to F). 

We also examined the taxonomic composition of the viral 
communities in relatively periodontally healthy subjects with that 
of the communities in subjects with significant disease. No signif- 
icant differences were identified in subgingival plaque from peri- 
odontally healthy subjects and those with disease, despite the pre- 
ponderance of myo viruses in subjects with disease. This suggests 
that the significant differences in viral communities may be more 
closely related to the virus families present than to their putative 
bacterial hosts (Fig. 3B). In saliva, there was a trend to lower levels 
of Firmicutes and Actinobacteria in orally healthy subjects than in 
those with disease, although only the difference in Firmicutes lev- 
els was statistically significant (Fig. 5A). In supragingival plaque, 
there was a significant decrease in the proportion of Bacteroidetes 
and an increase in Actinobacteria phage in subjects with periodontal 
disease (Fig. 5C), but the biological basis of these findings in 
saliva and supragingival plaque is unclear. 

DISCUSSION 

Human body surfaces are inhabited by endemic viral communities 
whose roles have not been thoroughly 
investigated. We previously demonstrated that there are robust 
communities of viruses that inhabit human saliva (6) and that 
many of these viruses are highly persistent over time (54). Because 
many of these persistent oral viruses are bacteriophage, they could 
play a role in determining oral bacterial community membership. 
We investigated the membership of the oral viral community in 
relatively periodontally healthy subjects and those with significant 
disease to determine whether there might exist ecological distinc- 
tions in the viral populations reflected in oral health status. We 
examined the viruses present in planktonic saliva and in subgin- 
gival and supragingival biofilms to provide a broad overview of 
different oral ecological niches. Our finding that membership in 
the biofilm virome is significantly associated with periodontal 
health and disease is the first statistically supported evidence that 
oral viral community membership is associated with a human 
disease condition. 

Bacteriophage are significant drivers of bacterial diversity in a 
variety of ecosystems (1-7), and most of the viruses identified in 
saliva and dental plaque were phage (Fig. 3). In this study, many of 
the phage encountered at all oral sites were predicted to be sipho- 
viruses, which generally have lysogenic lifestyles by integrating 
into their host genomes. Lysogenic oral viruses live in dynamic 
equilibrium with their cellular hosts and, as a result, are highly 
persistent members of the human oral microbiome (54). Myovi- 
ruses are often lytic and, because of their increased virulence for 
their host bacteria, may have a great impact upon local bacterial 
diversity. Myoviruses were highly prevalent in the subgingival 
crevice in subjects with periodontal disease (Fig. 3B), suggesting 
an expanded role in driving bacterial diversity in oral biofilm. In 

FIG 5 Bar plots (means ± standard errors of the means) of putative viral host taxonomy (A to C) and bacterial taxonomy based on 16S rRNA (D to F) at the 
phylum level. Each phylum is shown on the x axis, and the percentage of contigs or OTUs identified belonging to the observed phyla is shown on the y axis. Panels 
A and D represent saliva, panels B and E represent subgingival plaque, and panels C and F represent supragingival plaque. White bars represent relatively 
periodontally healthy subjects, and black bars represent subjects with significant periodontal disease. Asterisks indicate significant differences (P ≤ 0.05) 
between phyla in periodontally healthy subjects and those with disease. 



the development of periodontal disease, the surfaces of the gums 
and bones pull away from the teeth, forming pockets that are 
generally inhabited by different bacteria. While the profound dif- 
ferences in the subgingival virobiota may merely reflect changes in 
the bacterial biota that colonize these exposed pockets, the differ- 
ences in the subgingival plaque virome in subjects with periodon- 
tal disease may also have other biological implications. The lytic 
phage in the subgingival crevice likely help to shape the local microbial 
community structure and alter local biodiversity. 

There were significant differences between the putative bacte- 
rial hosts of viruses in saliva and those of viruses in oral biofilm 
when examined at a high taxonomic level (Fig. 5). Similar taxo- 
nomic differences were not observed when periodontally healthy 
subjects were compared with those with disease, suggesting that 
many of the observed differences in our subjects may have been 
due to changes in the relative abundance of virus families rather 
than their bacterial hosts. Because of their substantial coevolution 
with their bacterial hosts, culture-independent methods based on 
patterns of nucleotide usage are relatively accurate in predicting 
the hosts of lysogenic viruses at the genus level (56). However, 
techniques used to predict bacterial hosts are generally less suc- 
cessful at predicting the hosts of lytic viruses; thus, we opted to 
characterize lytic viruses only at a high taxonomic level in this 
study to reduce the potential for inaccuracy. 

Our data show a strong link between oral viruses and peri- 
odontal health (Table 3), biogeographic sites (Table 3), and indi- 
vidual subjects (Table 2) but likely only partially describe the ro- 
bust communities of viruses that inhabit the human oral cavity. 
Our methods and sequencing depth were designed to characterize 
the most abundant oral viruses (6) and likely offer limited insight 
into the less-abundant members of the human oral virome. This 
analysis included only DNA viruses, and it remains unclear 
whether the mouth is also inhabited by RNA viral communities. 
Because of the limited starting volume of saliva and the relatively 
scarce quantities of DNA recovered, we used multiple- 
displacement amplification (MDA) before sequencing each vi- 
rome. MDA may introduce biases into sequence data, particularly 
when relatively small quantities of starting nucleic acids are used 
(57). There are no available collections of non-MDA-treated oral 
viromes for comparison; however, substantial systematic and directional 
amplification biases would be necessary to reproduce the 
statistically significant trends found in these viromes. We used two 
separate techniques to assemble the sequence data, and they 
showed similar trends in individual-specific, biogeographic site-specific, 
and oral health status-specific viruses in each subject or 
subject group. We preferred to create the initial assemblies from 
each subject rather than use global assemblies from all of the sub- 
jects to reduce the potential for chimerism. Although the two 
methods showed similar trends, only the method that uses assem- 
blies from individual subjects produced statistically significant re- 
sults (Tables 2 and 3). 

While this study demonstrated a significantly altered oral viral 
ecology in subjects with periodontal disease, other studies have 
investigated viral community constituents in other disease condi- 
tions. There are conserved viral genotypes in the respiratory tracts 
of human subjects with cystic fibrosis (2), which could potentially 
be attributable to shared bacterial ecology. Our data, however, 
indicate a high level of commonality among bacterial species rep- 
resentation between relatively periodontally healthy subjects and 
those with more advanced disease but still showed that viruses 
were significantly associated with oral health status. A study of gut 
microbial communities in persons with Crohn's disease showed 
lower viral and bacterial diversity than in controls (58); however, 
both could potentially have been explained by antibiotic admin- 
istration. Only one healthy subject in this study received any an- 
tibiotics (Table 1), and thus, antibiotics cannot explain the signif- 
icant differences between the viral communities found in 
periodontally healthy subjects and those found in subjects with 
disease (Table 3). A recent study demonstrated a strong associa- 
tion between (i) host immune status and the use of antiviral ther- 
apies and (ii) viral community constituents in human plasma (8), 
indicating that viral communities respond to drug-mediated per- 
turbations. The effects of such perturbations on human oral viral 
communities have yet to be reported on. 

Biogeographic differences in human microbial ecology have 
been noted for cellular microbiota inhabiting different body sur- 
faces. Our approach is the first to assess oral biogeographic differ- 
ences in the viral microbiota of the human microbiome. Because 
of the substantial biomass necessary to evaluate DNA viruses by 
our techniques, we had to pool plaque samples from individual 
teeth in order to have sufficient biomass to examine subgingival 
and supragingival viral communities. As expected, the viral 
(Fig. 3B) and bacterial (Fig. 5A) biota of subgingival and supra- 
gingival plaque samples had similarities; however, each was quite 
distinct from that of saliva. The difference in viral ecology between 
planktonic saliva and biofilm communities is likely attributable to 
differences in the ecology of the bacteria and archaea that inhabit 
each individual niche and the various lifestyles observed (Fig. 3). 
Biogeographic differences in the cellular microbiota may also ex- 
plain our relative inability to identify sequences homologous to 
many biofilm-derived viral structural genes (Fig. 1), as the biofilm 
had abundant bacteria belonging to Bacteroidetes and Actinobac- 
teria (Fig. 5D to F) relative to Firmicutes (highly represented in 
saliva) and the representation of viruses for these bacteria may be 
heterogeneous in the NR database. There are several factors that 
likely contribute to our finding significant associations between 
viruses, individual subjects, and oral health status, while not iden- 
tifying similar trends in the bacterial biota. First, the difference in 
the techniques used (metagenomics versus 16S rRNA amplicon 
sequencing) probably accounted for some of the differences ob- 
served, likely resulting in lower resolution for the detection of 
differences among the members of the bacterial biota. Second, our 
analysis of viral and bacterial community constituents was based 
on the relative abundance of community members rather than 
just their presence or absence. When taking into account only the 
presence or absence of taxa, both the virobiota and the bacterial 
biota showed significant subject specificity and associations with 
oral health status (data not shown). Lastly, many human subjects 
have bacterial species in common; however, the prophage in these 
genomes vary considerably (59, 60). The relative host specificity of 
these prophage, particularly when they are expressed as virions, 
may account for much of the subject specificity detailed in this 
report. Therefore, the relatively large proportion of lysogenic si- 
phoviruses identified in each subject and at each biogeographic 
site would contribute substantially to the subject specificity ob- 
served. 

The data produced in this study and those of other studies 
characterizing human viral community ecology together suggest 
that viral communities respond to perturbation (8) and environ- 
mental factors similar to their counterpart bacterial communities. 
We recently demonstrated that unrelated household contacts are 
significantly more likely to have oral viruses in common (61), 
suggesting that they may be exposed to viruses from the same 
environmental reservoir. Furthermore, oral viruses are signifi- 
cantly associated with the sex of their human host (54), suggesting 
that host factors such as hormones may play a formative role in 
human viral ecology. The reservoir of antibiotic resistance is expanded 
in the mouse gut virome in response to antibiotic perturbation 
(62). In gut viromes, diet plays an important formative role 
in the ecology of viral communities (3). While the nature of the 
perturbations that result in periodontal disease in humans may be 
variable, our data demonstrate that altered oral viral ecology is an 
associated feature of significant periodontitis. Viruses of bacteria 
have the potential to eradicate their hosts or to provide them with 
beneficial gene functions (6); thus, the predominantly lytic viruses 
in the subgingival crevice may have the capacity to shape the nat- 
ural history of the oral microbiome in persons with periodontal 
disease. While the oral microbiome has been hypothesized to play 
a role in the development of periodontal disease (63-65), the role 
of viruses in these microbial communities has not been elucidated. 
Because of their potential to shape human microbial communi- 
ties, as well as host immune responses, viruses likely also play an 
important role in human oral health status. 

MATERIALS AND METHODS 

Human subjects. Subject recruitment and enrollment were approved by 
the University of California, San Diego, and the Western University Ad- 
ministrative Panels on Human Subjects in Medical Research. All of the 
subjects signed informed-consent forms indicating their willingness to 
participate in this study. Each subject underwent a baseline periodontal 
examination, including measurements of probing depths, clinical attach- 
ment loss, gingival index, plaque index, and gingival irritation (66), and 
their oral health status was recorded (Table 1). Dental plaque was col- 
lected from subgingival and supragingival biofilm samples from teeth 3, 9, 
12, 19, 25, and 28 and placed into 200 µl of 0.02-µm-filtered phosphate-buffered 
saline (PBS; Fisher Scientific, Chico, CA). Approximately 3 ml of 
saliva was also collected from each subject without stimulation. All specimens 
were immediately frozen on dry ice and stored at −20°C until use in 
this study. All of the subjects completed a questionnaire detailing their 
dietary habits and comorbidities. Exclusion criteria included preexisting 
medical conditions that could result in substantial immunosuppression. 
All of the participating subjects were unrelated, and only one had received 
any antibiotics in the 3 months prior to the beginning of the study (Ta- 
ble 1). 

Isolation and sequencing of oral viruses. Subgingival and supragingival 
plaque samples from each subject were pooled separately, washed 
twice in 0.02-µm-filtered PBS, and spun at 6,000 × g for 10 min to 
pellet the biofilm. The biofilm was then incubated at 37°C for 30 min and 
vortexed vigorously for 10 min to separate out viruses. The biofilm was 
then spun at 6,000 × g for 10 min, and the supernatants were treated in the 
same manner as the saliva samples, by using previously described methods 
for enrichment and extraction of nucleic acids from viruses (6). Briefly, 
samples were filtered sequentially with 0.45- and 0.2-µm filters (VWR, 
Radnor, PA) to remove cellular and other debris and purified on a cesium 
chloride gradient. Only the fraction with a density corresponding to most 
known bacteriophage (67) was retained, further purified on Amicon YM- 
100 protein purification columns (Millipore, Inc., Billerica, MA), treated 
with DNase I, and subjected to lysis and DNA purification with the Qiagen 
UltraSens virus kit (Qiagen, Valencia, CA). The resulting DNA was am- 
plified with the GenomiPhi V2 DNA amplification kit (GE Healthcare, 
Pittsburgh, PA), fragmented to roughly 200 to 400 bp with a Bioruptor 
(Diagenode, Denville, NJ), made into libraries with the Ion Plus Fragment 
Library kit (Life Technologies, Grand Island, NY) according to the man- 
ufacturer's instructions, and sequenced with 314 chips on an Ion Torrent 
Personal Genome Machine (PGM; Life Technologies, Grand Island, NY) 
(55). 

Processing and analysis of virome sequences. Sequencing reads were 
trimmed according to modified Phred scores of 0.5 with CLC Genomics 
Workbench 4.65 (CLC bio USA, Cambridge, MA), and low-complexity 
reads (where >25% of the length was due to homopolymer tracts) were 
removed prior to further analysis. After trimming and removal of low- 
complexity reads, any remaining reads with substantial length variation 
(<50 or >300 nt) or reads with ambiguous characters were removed 
prior to further analysis. Reads were screened for homology to a composite 
16S rRNA database including the Ribosomal Database Project database 
(68), the Green Genes database (69) and the Silva database (70) by 
BLASTN analysis with an E score cutoff value of 10^-5. Reads were also 
screened for homology to the Human Reference Database at 
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/ by BLASTN analysis with an 
E score cutoff value of 10^-5. Any reads homologous to sequences in the 
human database were removed prior to further analysis. Reads then were 
assembled with CLC Genomics Workbench 4.65 (CLC bio, Cambridge, 
MA) to construct contigs based on 98% identity with a minimum of 50% 
read overlap, consistent with criteria developed to discriminate between 
highly related viruses (71). Because the shortest reads were 50 nt, the 
minimum tolerable overlap was 25 nt and the mean overlap was no less 
than 73 nt, depending on the characteristics of each virome. Any contigs 
of <200 nt or with ambiguous characters were removed prior to further 
analysis. Length and GC content variation among contigs were assessed by 
using box-and-whisker plots created with Microsoft Excel 2007 (Mi- 
crosoft Corp., Redmond, WA). 
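
As an illustration only, the read-filtering criteria described above (length between 50 and 300 nt, no ambiguous characters, and no more than 25% of the read length in homopolymer tracts) can be sketched in Python. The function names are hypothetical, and the minimum run length that defines a homopolymer tract (min_run = 4) is an assumption made for this sketch; it is not taken from the CLC Genomics Workbench settings used in this study.

```python
from itertools import groupby

UNAMBIGUOUS = set("ACGT")


def homopolymer_fraction(seq, min_run=4):
    """Fraction of bases that lie in runs of one repeated base.

    min_run is an assumed threshold: runs shorter than this are not
    counted as homopolymer tracts.
    """
    if not seq:
        return 0.0
    run_lengths = (len(list(group)) for _, group in groupby(seq))
    in_tracts = sum(n for n in run_lengths if n >= min_run)
    return in_tracts / len(seq)


def keep_read(seq, min_len=50, max_len=300, max_homopolymer=0.25, min_run=4):
    """Return True if a trimmed read passes the three filters."""
    seq = seq.upper()
    # Length filter: discard reads <50 nt or >300 nt.
    if not (min_len <= len(seq) <= max_len):
        return False
    # Ambiguity filter: discard reads with any character other than A, C, G, T.
    if set(seq) - UNAMBIGUOUS:
        return False
    # Low-complexity filter: discard reads with >25% of length in tracts.
    return homopolymer_fraction(seq, min_run) <= max_homopolymer
```

A read would be passed to keep_read only after quality trimming, since trimming changes the read length on which the other filters depend.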

We used BLASTX analysis against the NR database (E score cutoff, 
10^-5) to find viral homologues to contigs from each subject and biogeographic 
site. Homologous contigs were determined by parsing BLASTX 
results for known viral genes, including replication, structural, transposi- 
tion, restriction/modification, hypothetical, and other genes previously 
found in viruses for which the E score was at least 10^-5. Each individual 
virome contig was annotated by this technique (Fig. 1); however, if the 
best hit for any portion of the contig was to a gene with no known func- 
tion, lower-level hits were used as long as they had known functions and 
still met the E score cutoff. The annotation data were compiled for all of 
the subjects and used to determine the relative proportion of assembled 
contigs that contained viral homologues. The phylum of the cellular hosts 
for each annotated contig was used to determine taxonomic distributions 
in each subject and at each biogeographic site. The relative abundances of 
virus families were determined by BLASTX analysis of the SEED database 
with MG-RAST (72). Analysis of shared homologues in each virome was 
performed by creating custom BLAST databases for each virome, comparing 
each database with all other viromes by BLASTN analysis (E score, 
<10^-10). Heat maps were generated on the basis of shared homologues 
across all of the subjects and depicted with Java TreeView (73). Heat map 
data were normalized on the basis of the total number of viral contigs for 
each virome. PCOA was performed on homologous virome contigs by 
using binary Sorensen distances and Qiime (74). We also used a separate 
technique for assembly by constructing global assemblies from all of the 
reads from all of the subjects and all of the time points with 98% identity 
over a minimum of 50% read overlap. The contribution of each subject 
and time point to each assembly was assessed and used to construct heat 
maps with Java TreeView (73). 
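
For readers unfamiliar with the distance measure named above, the binary Sorensen distance between two viromes depends only on the presence or absence of shared homologues. The following Python sketch computes it directly from sets of homologue identifiers; the identifiers and function names are hypothetical, and the sketch is meant to show the formula rather than to reproduce the Qiime implementation used in this study.

```python
def sorensen_distance(a, b):
    """Binary Sorensen (Sorensen-Dice) distance between two sets.

    d = 1 - 2 * |A and B| / (|A| + |B|); 0 means identical membership,
    1 means no members in common.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping sample name -> set of homologues."""
    names = sorted(profiles)
    matrix = [
        [sorensen_distance(profiles[i], profiles[j]) for j in names]
        for i in names
    ]
    return names, matrix
```

A matrix of this form is the usual input to principal-coordinate analysis.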

Analysis of 16S rRNA. Genomic DNA was prepared from saliva or 
pooled subgingival or supragingival plaque from each subject with the 
Qiagen QIAamp DNA minikit (Qiagen, Valencia, CA). Each sample was 
subjected to a bead-beating step prior to nucleic acid extraction with 
Lysing Matrix-B (MP Bio, Santa Ana, CA). We amplified the bacterial 16S 
rRNA V3 hypervariable region with the forward primer 341F 
(CCTACGGGAGGCAGCAG) fused with the Ion Torrent Adaptor A sequence and 
one of 23 unique 10-bp bar codes and reverse primer 514R 
(ATTACCGCGGCTGCTGG) fused with the Ion Torrent Adaptor P1 from each subject 
and biogeographic site (75). PCRs were performed with Platinum PCR 
SuperMix (Invitrogen, Carlsbad, CA) with the following cycling parame- 
ters: 94°C for 10 min, followed by 30 cycles of 94°C for 30 s, 53°C for 30 s, 
and 72°C for 30 s and a final elongation step of 72°C for 10 min. Resulting 
amplicons were purified on a 2% agarose gel stained with SYBR Safe 
(Invitrogen, Carlsbad, CA) with the MinElute PCR purification kit (Qia- 
gen, Valencia, CA). Amplicons were further purified with AMPure beads 
(Beckman Coulter, Brea, CA), and molar equivalents were determined for 
each sample with a Bioanalyzer 2100 HS DNA kit (Agilent Technologies, 
Santa Clara, CA). Samples were pooled into equimolar proportions and 
sequenced on 314 chips with an Ion Torrent PGM according to the manufacturer's 
instructions (Life Technologies, Grand Island, NY) (55). Resulting 
sequence reads were removed from the analysis if they were 
<130 nt, had any bar code or primer errors, contained any ambiguous 
characters, or contained any stretch of >6 homopolymers. Sequences 
were assigned to their respective samples on the basis of their 10-nt bar 
code sequences and analyzed further with Qiime (74). Briefly, represen- 
tative OTUs from each set were chosen at a minimum sequence identity of 
97% with UClust (76) and aligned with PyNast (77) against the GreenGenes 
database (69). Multiple alignments then were used to create phylogenies 
with FastTree (78), and taxonomy was assigned to each OTU 
with the RDP classifier (79, 80). PCOA was performed on the basis of beta 
diversity by using weighted UniFrac distances (81). Qiime was also used to 
calculate Shannon diversity indices. 
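
As a brief illustration of the diversity measure named above, the Shannon index can be computed from a vector of OTU counts as H = -sum(p_i * log(p_i)), where p_i is the relative abundance of OTU i. The Python sketch below is not the Qiime code used in this study; the function name is hypothetical, and the logarithm base (2) is an assumption that should be checked against the software version in use.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon diversity index from a sequence of OTU counts.

    OTUs with zero counts contribute nothing to the sum. The default
    base of 2 is an assumption of this sketch.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p, base) for p in proportions)
```

A sample dominated by a single OTU gives a value near 0, whereas a sample with many evenly represented OTUs gives a higher value.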

Statistical analysis. To assess whether viromes had significant overlap 
within or between subjects or subject groups, we performed a permuta- 
tion test based on resampling (10,000 iterations). We simulated the dis- 
tribution of the fraction of shared virome homologues from different 
subjects, biogeographic sites, or oral health statuses that were randomly 
chosen across all of the subjects and sites. For each set, we computed the 
summed fraction of shared homologues by using 1,000 random contigs 
between randomly chosen subjects or subject groups and from these com- 
puted an empirical null distribution of our statistic of interest (the frac- 
tion of shared homologues). The simulated statistics within each subject 
or group of subjects was referred to the null distribution of intersubject or 
intergroup comparisons, and the P value was computed as the fraction of 
times the simulated statistic for the each exceeded the observed statistic. 
An identical analysis was performed at the OTU level for the 16S rRNA 
taxonomic assignments. This technique was also used to assess the relative 
contributions of individual subjects and time points to global virome 
assemblies constructed from the reads of all of the subjects at all of the 
time points. We assessed whether any randomly selected contig had a 
higher proportion of intrasubject reads than intersubject reads recruited 
in its assembly. 
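
The general logic of a permutation test of this kind can be sketched in Python. The sketch is a simplified stand-in and not the resampling scheme used in this study: it assumes a precomputed symmetric matrix of pairwise shared-homologue fractions (hypothetical name: shared), uses the mean within-group sharing as the statistic, and builds the null distribution by shuffling group labels rather than by resampling 1,000 contigs per comparison. All function and variable names are assumptions.

```python
import random
from itertools import combinations


def mean_within_group(shared, labels):
    """Mean pairwise sharing over all pairs of samples with the same label."""
    values = [
        shared[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    if not values:
        raise ValueError("no within-group pairs to compare")
    return sum(values) / len(values)


def permutation_p_value(shared, labels, n_iter=10000, seed=0):
    """Label-permutation test for elevated within-group sharing.

    Returns the observed statistic and the fraction of permutations
    whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_within_group(shared, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling
        if mean_within_group(shared, shuffled) >= observed:
            hits += 1
    return observed, hits / n_iter
```

With few samples per group, the number of distinct label arrangements is small, which limits the smallest P value that such a test can return.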

Nucleotide sequence accession numbers. The virome and 16S rRNA 
sequences obtained in this study are available for download in the MG- 
RAST database (http://metagenomics.anl.gov/) under the project Phage 
Biofilm Study or under consecutive individual accession numbers 
4547358.3 to 4547405.3 for the viromes and 4547630.3 to 4547677.3 for 
the 16S rRNAs. 